Robust small area estimation for unit level model with density power divergence

Unit level model is one of the classical models in small area estimation, which plays an important role with unit information data. Empirical Bayesian(EB) estimation, as the optimal estimation under normal assumption, is the most commonly used parameter estimation method in unit level model. However, this kind of method is sensitive to outliers, and EB estimation will lead to considerable inflation of the mean square error(MSE) when there are outliers in the responses yij. In this study, we propose a robust estimation method for the unit-level model with outliers based on the minimum density power divergence. Firstly, by introducing the minimum density power divergence function, we give the estimation equation of the parameters of the unit level model, and obtain the asymptotic distribution of the robust parameters. Considering the existence of tuning parameters in the robust estimator, an optimal parameter selection algorithm is proposed. Secondly, empirical Bayesian predictors of unit and area mean in finite populations are given, and the MSE of the proposed robust estimators of small area means is given by bootstrap method. Finally, we verify the superior performance of our proposed method through simulation data and real data. Through comparison, our proposed method can can solve the outlier situation better.


Introduction
In sampling estimation, due to the challenge of small sample or even no sample, small area estimation(SAE) has received unanimous favor from statisticians [1][2][3][4].Compared with the traditional direct estimation method, the small area estimation can solve the small sample problem better by "borrowing strength" from the auxiliary information.In practice, the small area estimation method is widely used in population statistics [1], medical statistics [2], agricultural statistics [5,6], poverty rate estimation [3,7] and other fields.In terms of theoretical research, the theory of small area estimation has also been fully developed, forming a relatively complete theoretical system, [8] describe the basic theory of small area; [9] introduces the theory of small area estimation of several kinds of mixed models.For comprehensive overviews of small area estimation, see [8][9][10][11].
Among the SAE methods, the model-based SAE method has received more attention from statisticians.Although direct estimation can give an unbiased estimator of the target variable, due to the small sample size, direct estimator using only the original data is not reliable.The small sample size of raw data can be overcome by statistical models using auxiliary variables.As one of the basic models of small area estimation, unit level model can deal with the estimation of target variables at each unit level in a small area and calculate the corresponding area level values from unit data.Due to the limitations of data collection, acquisition of auxiliary variables, and model calculation, unit level model is not as concerned by scholars as area level model.If observation data and auxiliary information can be obtained at the unit level, the establishment of the unit level model is a better choice for SAE.[5] use the nested error regression(NER) model to estimate crop area at county level based on sampling data and satellite data.[12] generalizes the NER model and discusses the estimators under the generalized linear mixed model by using the hierarchical Bayes(HB) method.Empirical best linear unbiased estimator (EBLUP) is the most widely used method in model-based SAE, which can solve the estimation problem of mixed linear models well.When the observed variables are dichotomous variables or counting variables, the empirical Bayes (EB) method is more widely used, for example, [1,2] have been mentioned.In the basic unit level model, the errors of individual and area-specific random effect are assumed to follow the normal distribution.However, [13] points out that this assumption is difficult to be verified in practice, which means that the traditional EB method is not reliable in the presence of outliers.Meanwhile, in practical application, due to the small sample size, outliers are are very common, which will cause large estimation deviation in the traditional estimation method.In this paper, we focus on unit level models whose random effects have different skewed distributions or observations in the model have outliers, and propose robust estimation methods for such models.
The existence and influence of outliers in sampling estimation have been studied for a long time.[14] mentioned that outliers are a fact of life for any survey.[15] discussed how outliers can affect shrinkage estimators, even a single outlier may lead all the small area estimates to collapse to their corresponding direct estimates.At present, there are several common methods to deal with outlier observations [14,[16][17][18].In the first method, the outliers are deleted directly and the remaining observations are used for estimation and prediction.Obviously, this method may be feasible to delete a few outlier observations when the sample size is very large.However, when the sample size is small, the method will not only cause the loss of information, but also lead to the deviation of the estimation from the real value.In the second method, non-outlier observations are used instead of outliers for estimation, and robust projection is used for robust prediction in the unsampled population.However, [16] states that an observation cannot be considered unique if it is accurately captured, and there is no reason to think that outlier observations cannot be included in an unsampled population.[14] have used robust projection method to construct robust small area estimator, and obtained MSE estimation for robust predictors.[3] proposed using global-local shrinkage priors for modeling random effects that allow potential outliers in the areal effect.In the other method, the influence of outliers on estimation results can be reduced by constructing robust estimators insensitive to outlier observations, which is also the main method concerned by scholars.
The early research on robust estimation of SAE can be traced back to the mean robust estimators of strata means, which proposed by [19], with small area means being a special case.[15] used a hierarchical Bayes (HB) framework to study the effect of outliers in v i .The HB estimators based on long-tailed distributions (such as, t and Cauchy) are more robust to outlying area-specific error v i than the estimators based on normal distribution.[20] research results indicate that the use of a t-distribution with small k can diminish the effect of outliers in the sense that more weight is given to the direct estimator b θ i than in the case of normal area-specific error.However, these two methods only consider the robust estimation in special cases, and do not discuss the properties of the estimation parameter.[17] introduced robust SAE based on quantiles, which used M-quantiles to offer an alternative to the modeling of between area variation.[21] proposed a robust Bayes predictor for the FH model that can overcome the over-shrink caused by outlier observations, but the influence of outliers on the model coefficients is not taken into account.[18] developed robust EBLUP(REBLUP) in general linear mixed models, which they used Huber's ϕ-function to modify certain "residual" terms by down-weighting contributions due to the outliers.By using a parametric bootstrap procedure, they have also developed estimators of the MSEs of the REBLUPs.Up to now, this method has been widely used in small area robust estimation.Later researches on robust SAE are completed on the basis of the above research.Such as, robust SAE in business surveys is discussed by using robust projection and M-quantile method in [22,23] studied the robust estimation of nested error linear regression model by using huber's ϕ-function and M-quantile based on hierarchical bayes theory and given prior information; The robust SAE of generalized linear models is discussed in [24,25] reviewed the robust estimation of small area with outliers, and proposed Bootstrap MSE based on M-quantile estimators.[26] provide an overview of robust small area estimation.Readers who are interested in this research can refer to [22][23][24][25][26][27].In this paper, we propose a new robust Bayes estimator using dendity power divergence, and investigate the proposed estimator's MSE and parameter estimation.
In the field of statistical inference, when outliers appear in the data, the density-based minimum distance is an effective method to solve this problem.The density power divergence (DPD) [4], which measures the discrepancy between two density functions, has been successfully used to build a robust estimator for independent and identically distributed observations.Since then, the method by minimzing DPD(MDPD) has been one of the most powerful tools in robust estimation.[28] extended the construction of the DPD and the corresponding minimum DPD estimator (MDPDE) to the case of independent but non-identically distributed data.The main idea of the DPD is to give small weight to the terms related to outliers, and then, the parameter estimation becomes robust against outliers.In the cases, the parameter α controls the trade-off between robustness and efficiency.The smaller the value of α is, the more efficient the model is; The higher the value of α is, the better the stability of the outlier is.The minimum divergence estimator corresponding to α = 0 is the maximum likehood estimator(MLE).[29] used density power divergence to study robust Bayesian estimators, and discussed the asymptotic properties of the estimators and parameters.[30] discussed the theory of MDPD method in robust regression using the methods of S-estimation.This result showed that robust estimation based on MDPD method and Huber-ϕ function had similar effects.[31] employed the γ-divergence (similar to density power divergence) for the Fay-Herriot model and discussed empirical Bayes confidence intervals rather than MSE.Therefore, in this paper, we apply the MDPD method to the unit level model and compare it with the robust estimation proposed by [18].
In this paper, our main work is reflected in the following aspects.Firstly, the MDPD method is applied to the basic unit level model with outliers, and robust estimates of unknown parameters in the model are obtained.The asymptotic properties of the estimated parameters are further derived.Secondly, the selection algorithm of adjustment parameters in estimators is proposed to select the optimal tuning parameters.Thirdly, combined with parameter estimators, the general expression of robust small area estimator of unit level model is proposed.Fourthly, Bootstrap method is used to calculate the MSE of robust estimators, and the algorithm is given.Finally, the maximum likelihood estimator, the robust estimator in [18] and the proposed robust estimator are compared between simulated data and actual data to illustrate the efficiency and robustness properties of the estimators.
The rest of this paper is organized as follows.In Section 2, we introduce the basic unit level model and some notations used.In Section 3, the background and definition of MDPD method are reviewed, and the robust estimation equations of model parameters are obtained by applying MDPD method to unit-level model.The asymptotic properties of robust parameters obtained by MDPD method are presented in Section 4. In Section 5, firstly, we propose robust empirical Bayes predicator based on MDPDE, secondly, we give an algorithm to select the optimum tuning parameter, and finally, we give the algorithm procedure to estimate MSE of robust EBP using Bootstrap method.In Section 6, we investigate performances of the proposed estimator by simulation and real data.The proposed method is compared with the robust method of [18] under different outlier generation backgrounds.

Basic unit level model
The NER model is a popular basic unit level model proposed by [5].Suppose a finite survey population is partitioned into m small areas, with the i-th area having N i units such that P m i¼1 N i ¼ N. We assume that y ij is the value of a response variable Y for the j th unit in the ith small area.The unit-specific auxiliary data x ij = (x ij1 , . .., x ijp ) T are available for each population element j(j = 1, . .., N i ) in each small area i.Then, the NER model described as Where β is a p-variate vector of unknown regression coefficients, the area-specific random effects v i are assumed to be independent Nð0; s 2 v Þ.The unit errors e ij ¼ k ij ẽij for known constants k ij and the ẽij 's are iid random variables independent of the v i 's, with Nð0; s 2 e Þ.We assume that a sample s i of size n i is taken from the N i units in the i th area (i = 1, . .., m), and that the sample values also obey the assumed model (1).The latter assumption is satisfied under simple random sampling from each area or more generally for sampling designs that use the auxiliary information x ij in the selection of the samples s i .To see this, we write (1) in matrix form as where x i is a n i × p matrix, y i and e i are n i × 1 vectors, and 1 i is the n i × 1 vector of ones.
denote the unknown parameter of the model given in Eq (1).Under the assumption of normality of the model, y i jx i obeys normal distribution.The conditional probability density is where The matrix V i can be inverted explicitly.Using the Sherman-Morrison formula: and denoting where 3 Density power divergence

Minimum density power divergence estimator
The density power divergence (DPD) measure was developed by [4] in terms of a tuning parameter γ.The DPD measure between the model density f θ and the true density g is defined as where γ is a tuning parameter.Note that, G is not necessarily a member of the model family F θ .Further, for γ = 0, the DPD measure is obtained as a limiting case of γ !0 + , and is same as the Kullback-Leibler (KL) divergence.Generally, given a parametric model, we estimate θ by minimizing the DPD measure with respect to θ over its parametric space Θ.We call the estimator the minimum power divergence estimator (MDPDE).It is well-known that, for γ = 0, minimization of the KL-divergent is equivalent to maximization of the log-likelihood function.Thus, the MLE can be considered as a special case of the MDPDE when γ = 0. We substitute the conditional density f θ (y i j X i ) in 3 as the model density into the definition of DPD, and define the DPD measure based on SAE model as where h(x) is the marginal probability density function of X and g(yjx) is the true conditional density of Y given X For γ > 0, after approximating the true distribution with the empirical, the DPD measure turns out to be where the last part of the expression in the right hand side of Eq (2) cðgÞ ¼ Hence, Eq (5) simplifies to where B i ¼ ðy i À x i bÞ T V À 1 i ðy i À x i bÞ.Using the formula (4), we get The MDPDE of θ is obtained by minimizing b d g ðf θ ; gÞ over θ 2 Θ, where Θ is the parameter space composed by all possible parameters θ.Obviously, if the i-th observation is an outlier, then the value of the conditional density f θ (y i j x i ) is smaller compared to other observations.In this way, the second term of Eq ( 5) is negligible when γ > 0, thus the corresponding MDPDE becomes robust against outlier.The tuning parameter γ controls the trade-off between efficiency and robustness of MDPDE.When γ increases, robustness increases and efficiency decreases, and vice versa.In addition, when γ = 0, the DPD becomes KL divergence, and MDPDE becomes MLE.At this time, for an outlying observation, the KL divergenence measure diverges as f θ (y i j x i ) ! 0, and MLE method is invalid.
The partial derivative of b d g ðf θ ; gÞ with respect to θ ¼ 5) is taken to obtain the following estimated equation:

Choice of the optimal tuning parameter γ
It can be seen from the above results that the unknown tuning parameter γ is included in the estimated expression of parameter θ which is iterated by MDPD method.The choice of γ determines the trade-off between robustness and statistical efficiency [4].When the value of γ is closer to 1, the estimated parameter has stronger robustness; otherwise, the weaker the robustness, the stronger the efficiency.Therefore, choosing the appropriate tuning parameter γ is the key factor in robust estimation.we hope to choose a data-driven value of γ in an optimal way which balances the concerns of robustness and efficiency.At present, there are two main methods for selecting tuning parameters.One is based on the proportion relationship between efficiency and robustness.Researchers determine the proportion between them according to their own needs, and then select the optimal tuning parameters.For example, [29] selects tuning parameters when using DPD method for area level estimation.Another method is based on data-driven parameter selection method, [32] minimizes the MSE of the estimated parameter to get the optimal tuning parameter, but this method depends on the selection of initial value, different pilot value of parameter may result in different tuning parameters.[33] Based on [32], a method of adjusting parameter selection that does not depend on the initial value is proposed.In this paper, we will use the parameter selection method mentioned in [33] to select the optimal tuning parameters for constructing robust small area estimators.
For the true value θ* of the unknown parameter θ, the optimal tuning parameter γ is obtained by minimizing the summed MSE of the MDPD estimator, i.e As the unknown parameter θ* is contained in the formula (10), there are usually two ways to select the optimal tuning parameter γ.One method is to think that the estimated b y g is the true value of parameter θ*, that is, the optimal tuning parameter can be obtained only by solving the minimum value of the first term of the above equation.This method is easy to use, but we know that b y g is asymptotically tend to θ, so direct substitution will produce a large error.Another method is to set an initial value θ p of θ* and then minimize (10) to find the optimal parameter.This method is used in [32], and wick-Jones (WJ) algorithm is given to select the optimal tuning parameters.However, the WJ algorithm relies heavily on the selection of the initial θ p , which directly determines the selection of the optimal parameter.In order to overcome the shortcomings of the above two selection methods, [33] proposed iterative WJ (IWJ) algorithm, which is used to calculate the optimal tuning parameters.In this paper, this method is also used to select the optimal tuning parameters.The specific algorithm steps are as follows:

Asymptotic distribution of the robust estimator
In this section, we investigate the asymptotic distribution of the robust estimator of model parameters, when the data generating distribution G(y j x) is not necessarily in the model famliy.Let's define the score function as u θ ðy i j According to the definition of score function and (3), we can get We further define J ¼ lim m!1 For the asymptotic distribution of the MDPDE, we need the following assumptions: 1.The true density g(y j x) is supported over the entire real line R; There is an open subset ω 2 Θ 0 containing the best fitting parameter θ such tat J is positive definite for all θ 2 ω; 3. There exist functions M jkl (x, y) such that j@ 3 exp ½ðy À xβÞ Proof: The proof of the theorem is given in S1 File.
Note that, if the true distribution g(y j x) is a member of the model family f θ (yjx) for some θ 2 Θ, then y u θ ðy j x i Þf gþ1 θ ðy j x i Þdy: ð13Þ In this case, the symmetric matrix J (i) can be partitioned as Combining Eqs ( 12) and ( 13), the elements in J (i) can be deduced.Detailed calculation and results can be found in S1 File.
Similarly, ξ (i) can be partitioned as ξ ðiÞ ¼ ðξ ðiÞT b ; ξ ðiÞ T , In S1 File, the derivation formula of the components of ξ (i) is given.
Note that if we write the matrix J (i) as a function of γ, i.e.J (i) � J (i) (γ), Based on the representation of K (i) and J (i) in ( 13), we have K ðiÞ ¼ J ðiÞ ð2gÞ À ξ ðiÞ ξ ðiÞT : Therefore, K can be written as Through the calculation of the above covariance matrix, it can be seen that the parameter variance increases with the increase of γ, which indicates that the efficiency of MDPDE decreases with the increase of γ.This further verifies that the tuning parameter γ is used to control the trade-off between efficiency and robustness of MDPDE, and that robustness increases and efficiency decreases as γ increases However, our subsequent simulations show that this loss of efficiency is not significant.

Robust EB predictor under a finite population
In this section,we discuss EB estimators of parameters of a finite population.A finite population P contains N units and a sample s of size n is drawn from P. We denote by y P the unit values vector of the target variable in the population, which is assumed to be random with a given joint probability distribution.We write y s as the subvector of y P composed of sampling units, y r as the subvector composed of unsampled units and assume without loss of generality that the first n units of y are the sample elements, that is, We assume that the vaule y ij of a target variable for jth unit in ith area follows the basic unit level model (1).At the same time, we assume that y i obeys normal distribution under the condition of auxiliary information X i , i.e y i * N(X i β, V i ).We next partition (2) into sampled and nonsampled parts:

� � ;
where the subscript r denotes the nonsampled units.The covriance matrix can be decomposited as: where The non-sampled sub-vectors y ir follow the marginal models derived from the population model (1), i.e.
The vectors e ir are independent with e ir � Nð0 The vectors y ir are independent and normally distributed with where μ ir = X ir β.
The distribution of y ir , given the sample data y is , is The conditional mean vector is and the conditional covariance matrix is where a ir ¼ ða in i þ1 ; . . .; If n i 6 ¼ 0 and j 2 U i − s i , the conditional mean is For any j 2 U i − s i , the conditional variance is In general, our goal is to use the available sample data y s to estimate the value of the real measurable function τ = h(y p ) with respect to the population vector y p .Therefore, The conditional distribution of y r , given y s , plays an important role in the calculations of the best predictors (BPs) of population parameters τ = h(y).Assume that the model parameters Under the unit level model, the BP is an unbiased predictor b t ¼ b hðy s Þ of τ that minimizes the MSE.According to [8], the BP of τ is b t bp ¼ E y r ½t j y s �.
Therefore, the EB estimator of τ can be obtained by using formula ( 15)-( 17), In practice, the model , and then the variables b y r are generated from (15), thus, the EBLUP of b t ¼ b hðy P Þ can be obtained.

EBLUP of area means
In this section, we derives the EBLUPs of Y i , where Y i ¼ 1 and b s 2 e be consistent estimators of the model parameters β; s 2 v , and s 2 e , respectively.Under the conditioned distribution (15), the predicted values are

MSE of the EBP
It is usually difficult to estimate the mean square prediction error(MSPE) of unit level model, which is caused by two reasons.First, the true distribution of error terms and non-sampled units in the unit level model is unknown, so it is impossible to obtain the density function for MSPE calculation.Second, even when the distribution of unsampled units is known, MSPE calculations are sometimes challenged by multiple integrals in the calculation of expectations due to the fact that model involve unit level data.In this paper, we use the parameter Bootstrap method mentioned in [18,34] to estimate the MSPE.
The parametric bootstrap methods can be used to estimate the MSE of EBP b d EB i for finite populations.The method proceed as follows: 1.The robust estimation of parameters b β; b s 2 v and b s 2 e are obtained by using the DPD method mentioned in section 2;

The bootstrap MSE estimator of b t EB
i is given by

Application
In this section, we compare the effects of several robust Bayes estimators with the estimator proposed in this paper based on simulated and real data.

Simulation
6.1.1Contaminated distribution.In this paper, we do the same simulation as [18], that is, a unit level model with a single auxiliary variable x: with k = 40 and n = 4.Where auxiliary variable x ij * N(1, 1), and the area-specific random effects v i and the random errors e ij were generated from the contaminated distribution This means that a (1 − η) proportion of the errors' were generated from the underlying "true" distribution N(0, σ 2 ) and the remaining η proportion of the errors were generated from the "arbitrary" contaminated distribution Nð0; s 2 1 Þ: The choice η = 0 indicates no contamination of the distribution.For the underlying distributions, we set s 2 e ¼ s 2 v ¼ 1 and for the contaminated distributions, we set s 2 e1 ¼ s 2 v1 ¼ 25 and the proportion of contamination η 1 = η 2 = 0.10.We considered four possible combinations {(0, 0), (0, v), (0, e), (v, e)} of contamination, where (0, 0) indicates no contamination of the distributions, (0, v) indicates the contamination only in the distribution of the area-specific random effects v i , and so on.
For each simulation configuration, the regression coefficients were fixed at (β 0 , β 1 ) = (1, 1).We ran four sets of simulations, each of size 500.Given our focus on bias robustness, the main performance indicator for an MSE estimator in four studies is the relative bias, defined by Where the subscript i indexes the small areas and the subscript j indexes the S Monte Carlo simulations, with b y ij denoting the simulation j value of the estimator in area i, and Y i denotes the actual vaule in area i.We also measured the stability of an MSE estimator by its relative MSE, In the simulation experiment, we compare the traditional maximum likelihood(ML) method, the robust estimation method(RML) mentioned in [18], and the robust minimum density power divergence method(RMD) proposed by us when taking different tuning parameters(γ = 0.1, 0.2, 0.3and the optimal γ choosed by IWJ algorithm).
First we compare the estimates of model parameters under four contamination scenarios.Table 1 shows the RABiases and RAMSEs of estimators obtained from the robust estimation methods under different conditions, where the first row corresponding to each parameter represents the RABias and the second row represents the RAMSE.
The following conclusions can be clearly drawn from Table 1.In case of no contamination in the data, ML method in parameter estimation performance is best, However, RML and RMD methods with smaller tuning parameters are very similar to ML estimation results.This indicates that in this case, RMD method with smaller tuning parameters and RML method are almost as efficient as the ML method, and there is little difference between the RABias and RAMSE, while RMD method with larger tuning parameters performs poorly.It shows that the selection of tuning parameters is very important, and the optimal tuning parameters can be obtained according to the algorithm provided in Section 3.2.In the case of contamination in random effect v i , The variance σ estimated by the ML method has a large RABias and RAMSE, while the estimation by the RML method becomes smaller.However, the RMD method proposed by us is obviously better than the RML method, which has smaller bias and MSE in the estimation of all parameters.According to the simulated data in the Table 1, when tuning parameter γ was obtained by IWJ algorithm, the proposed RMD method provides better results for the estimation of all parameter.Similarly, in the case of outliers in the random errors e ij , s 2 e estimated by ML method has a large bias and MSE.RML method has a good control on the influence of outliers, and the estimated bias and MSE of each parameter are relatively small.In the proposed RMD method, when γ = 0.1, the estimation of model variance in the results is reduced, but it is not as good as the RML method.However, when γ > = 0.2, the RMD method performs better than the RML method.In the case of both area effects v i and model errors e ij are contaminated, the ML estimator of variance component is heavily influenced by the outliers and produced much larger biases and MSE, RML method reduced the effects of outliers, but performance is not the best.The proposed RMD method is significantly better than the RML method, as we only need to select appropriate tuning parameters.
Next, using the same data set used in the above simulation, we consider the estimation of small area mean.The mean of the known auxiliary variables in the ith region are X i ¼ 1; i ¼ 1; 2; . . .; m.Table 2 presents average simulated RABiases and average simulated RAMSE(averaged over the areas) of the estimators of small area means for the proposed and classical methods.The M-quantile(MQ) regression method is proposed by [17], and the area mean is where, in this case, � Y ri ¼ x ri βt i with βt i estimating β t i in q t i ðy j j x j Þ ¼ x j β t i .the bias corrected version of the REBLUP(BC-RML) method is also used to compare with other methods, and the simulation results are shown in Table 2.
In the case of uncontaminated data, EBLUP obtained using the ML method appears to be the most efficient, as expected.The REBLUP using the RML method and the proposed robust MDPDE(RMDPDE) are also seen to be almost as efficient as the EBLUP.In the other three cases with outliers, we can get a conclusion consistent with the above simulation through the data in the Table 2, the small area mean obtained by ML method has a large bias and MSPE, the small area mean obtained by RML method performs slightly better, the proposed RMD method performs strictly better than the RML method.
In order to check the performance of several existing robust methods on the size of contaminated proportion and variance of contaminated distribution, we further simulated and verified the variation of MSE of estimated parameters with contaminated proportion and variance of contaminated distribution.Just like the above simulation steps, we consider the estimation effect under the three cases where the distribution of area-specific random error is contaminated, the distribution of unit random error is contaminated, and the distribution of both unit random error and area-specific random error is contaminated.In simulating contaminated data, we use the model in section 6.1.1 to generate data, and then perform parameter estimation using the method mentioned above.The simulation was repeated 500 times and the average MSE was taken into account to plot the change curve.In the first case, the MSE performance of the estimated parameters is considered as the contaminated proportion increases.The variance of contaminated distribution was fixed at 25, and the MSE of the estimated parameters under the three contamination scenarios was considered when the contaminated proportion changed between 0 and 0.5 with a step length of 0.02.In another case, MSE performance of the estimated parameters is considered when variance of contaminated distribution increases.In the contaminated distribution, the contaminated proportion was determined to be 0.1, considering that the variance of the contaminated distribution increased by 5 steps from 5 to 100, the change of MSE of the estimated parameters under the three contamination conditions was considered.
As can be seen from the Figs 1 and 2, when the contaminated proportion of random effect increases from 0 to 0.5, the MSE of the four parameters in the small area model increases with the increase of the contaminated proportion.Among the MSE of the four parameters, except for s 2 e , the MSE of the other three parameters is not very large.As we expected, s 2 e is easily affected by contaminated proportion of e ij , and the estimated MSE is relatively large.In the comparison between the several methods, ML performed worst.When the tuning parameter γ was small(0-0.2),MDPD performed almost as well as RML.In some cases, MDPD performed better, but when the value of γ was large, RML performed better.
Combined with Figs 3 and 4, we can easily find the following conclusions.When the areaspecific random effect v i is contaminated, the MSE of b 0 ; s 2 v increases with the increase of the contaminated proportion, while the MSE of b 1 ; s 2 e is independent of the contaminated proportion, and the MSE is small.Comparing the performance of several methods for MSE estimation of four parameters, The MSE of b 1 ; s 2 e is small, and there is little difference among the methods.In the estimation of MSE of b 0 ; s 2 v , ML method performs the worst, RML method performs slightly better than ML, and several MDPD methods perform significantly better than the above two methods.It further shows that the MDPD method is effective.
When both the random error e ij and the area-specific random error v i are contaminated, the change of MSE of the estimated parameters with the contaminated proportion is shown in Figs 5 and 6.As can be seen from the figure, when the contaminated proportion increases, the MSE of parameters also increases.In comparison with several estimation methods, the proposed MDPD method is obviously superior to RML and ML method, while ML method performs the worst.
When the individual error e ij is contaminated and the variance of the contamination distribution increases, the MSE of the four parameters is presented in Figs 7 and 8.As can be seen from the figure, when the variance of contamination distribution increases, only the MSE of parameters obtained by ML method shows a significant increase trend.However, the MSE of the parameters obtained by other robust estimation methods does not increase significantly, which indicates that the robust estimation method is uniformly effective in this case.Compared with several kinds of robust methods, the proposed MDPD method is superior to RML method, especially in the estimation of the MSE of s 2 e .When the area-specific random error is contaminated and the variance of the contamination distribution varies from 0 to 100, the MSE of the parameters is shown in the Figs 9 and 10.It can be seen from the figure that the MSE of b 1 ; s 2 e is independent of the variance of contamination distribution.In the estimation of MSE of parameters b 0 ; s 2 v , ML method is seriously affected by the variance of contamination distribution.RML method has improved its performance, but not as good as MDPD method.When both individual error e ij and area-specific random error v i are contaminated, Where the contaminated proportion is 0.1 and the variance of the contamination distribution varies from 0 to 100, the MSE of the parameters is shown in Figs 11 and 12.It can be seen from the figure that ML method has the worst performance, while RML method has improved the estimation effect, but it is not good for the estimation of parameters b 0 ; s 2 v .In summary, MDPD method has good robustness for the estimation of several parameters.

Finite population area means.
In this section, we focus on the small area means Y i for finite population contains m areas and the ith area of size N i .We also use the method in 6.1.1 to simulate the performance of the population mean when m = 40 and each area is equal in size to N i = 40, 80, 200 respectively.we generated a finite population using the unit-level model ( 2) with x = (1, x) t , where x * N(1, 1).For each of the four contamination schemes used in Section 6.1.1,we then generated a series of 500 population data sets.From each population data set, ni = 4 units from N i units in the ith area were selected as a random samples.For the data set of each simulation, we can use the robust method mentioned in the above simulation to obtain the small area mean Y i for the ith area.Finally, we compare the average estimation of area mean after 500 simulations.Table 3 presents average simulated RABiases(first line) and average simulated RAMSEs (second line,averaged over the areas) of the estimators of small area means Y i .From the simulation results, in general, the proposed RMD method has smaller MSE in most cases.In some cases, it is not as good as the estimated effect of RML method, but the difference is not  significant.In addition, we find that when the area size increases, such as Ni = 200, the estimation obtained by the proposed method has a smaller MSE, and the estimation effect is better than that obtained by traditional estimation method.An interesting finding is that, when the area-specific random effects v i is contaminated, although the ML method has a large relative bias and MSE in the estimation of model parameters, the area mean obtained by the ML method has a smaller MSE.It shows that ML method is not sensitive to the variation of area-specific random effect, but is very to the variation of model random error e ij .And the method proposed in this paper has a good robust effect on the two random effects.

Real data
In this section, we use the data that is used by [5] to estimate the area under corn and soybeans for each of m = 12 counties in North-Central Iowa.This data can be obtained from the R package "sae", which contains 37 samples of areas of corn and soybeans from the 12 counties, as well as the number of pixels classified by the LANDSAT satellite as corn and soybeans for each sample segment.The unit-level model was established with the data collected from farm interviews as the dependent variable and LANDSAT satellite data as the auxiliary variable.
which is a special case of model (1) with Here, y ij is the number of hectares of corn (or soybeans), x ij1 is the number of pixels classified as corn, and x ij2 is the number of pixels classified as soybeans in the jth area segment of the ith county.
[5] identified an observation in Hardin county as an outlier, and they simply delete this observation when predicting the areas of corn and soybean.In [18], the robust estimation method is used to analyze this data, and the corresponding predicted value in the presence of outliers is given.Here, we use our proposed robust estimation method to model the data and analyze the influence of outlier on traditional estimators.
Considering the existence of outliers in the data, the robust estimation method proposed by us is considered to estimate and predicte the areas of corn in each segement.In addition, since there is only one outlier observation in this data, we select tuning parameter γ = 0.01, 0.05 for estimation in the proposed robust estimation method.In Table 4, regression coefficients and variance of random errors estimated by ML method, robust method proposed by [18] and MDPD method proposed by us are shown.The standard error for each parameter obtained from the asymptotic distribution in Section 4 is also shown in parentheses.When the tuning parameter γ = 0.01, It is clear from the table that the parameters estimated by MDPD are between those estimated by ML method and RML method.When the tuning parameter γ was increased to 0.05, the coefficients estimated by the model changed significantly.By comparing the standard errors of the parameters shown in the table, it can be seen that the parameters estimated by the proposed method have smaller standard errors.
In order to compare the estimation results, we show the predicted values of the mean hectares of corn per segment using the mldel (20).The EBLUP values obtained using the above estimation method are presented in the Table 5, where the Bootstrap estimates of the MSPE from 500 bootstrap samples are shown in parentheses.First of all, in terms of estimation results, the estimation of the region without outliers by using the proposed estimation method is closer to the result of ML estimation, and the prediction of region Hardin has been improved to some extent.Secondly, by comparing bootstrap MSPE, the MSPE values obtained by our proposed method are smaller, which shows that our proposed method is effective.

Discussion
In this paper, we propose a robust small area estimation method for unit level models with outlier observations.By introducing MDPD method, a robust estimation method with outliers and non-normal distribution errors is presented.Firstly, we have proposed an estimation equation for the parameters of the cell level model based on MDPD method and obtained the asymptotic properties of the model parameters.Secondly, combined with the asymptotic distribution of parameters, the selection procedure of optimal tuning parameter is given.Thirdly, the EBLUP values of unit and area mean in finite population is given.Finally, we verify the superior performance of our proposed method through simulated data and real data.In the simulation part, we simulate the robust estimation when the distribution is polluted, and discuss the effects of several kinds of robust estimation methods in three kinds of pollution cases.In particular, we discuss the variation of MSE of several estimation methods when the pollution ratio changes and the variance of pollution distribution changes.At the same time, the simulation results show that the proposed method can solve the outlier situation better.In the real data, we use the classical data of a small area estimation to illustrate the effectiveness of our proposed method.Through comparison, our proposed method can well deal with the special case of outlier observation.
Furthermore, it can be verified that the proposed method is also effective for random effects subject to other biased distributions.In this paper, we note that when the distribution of random errors is contaminated and the probability of contamination is greater than 0.3, the performance of our proposed estimation method is generally poor than the robust estimation method in [18], but in this case, the MSE obtained by several kinds of methods are very large, and the robust estimation results are not very valuable.In the next step, the estimation method proposed by us can also be applied to the small region estimation problem of exponential distribution.Of course, these work need to be further studied and proved.
In this paper, we use IWJ algorithm to select the optimal tuning parameters.In similar related work, the parameter selection algorithm based on Hyvarinen score is given in [35].In further research, this algorithm can be applied to the unit level model proposed in this paper, and the purpose of selecting the optimal tuning parameters can also be achieved.In addition, we only compared two classical robust estimation methods.In order to further illustrate the effectiveness of the method proposed in this paper, we can further compare the method proposed in this paper with [3,17] and other methods.

Theorem 4 . 1
dydx < 1 for all j, k and l.Under the regularity conditions (1)-(3), with probability tending to 1 as m ! 1, there exists b θ, such that 1. b θ is consistent for θ; 2. the asymptotic distribution of b θ is given by

1 g g 1þg ðy j xÞg hðxÞdxdy; for g > 0 R x R y gðy j xÞ log gðyjxÞ f θ ðyjxÞ
Repeat steps (2)-(7) a large number of times B. Let t * i ðbÞ be true value and b t EB * i ðbÞ the EB estimator obtained in b th replicate of the bootstrap procedure, b = 1, . .., B.

Table 4 . Estimates of the model parameters from several methods. Standard errors are shown in the parenthesis.
https://doi.org/10.1371/journal.pone.0288639.t004